I just spent an afternoon figuring out how to get Python 2.7 to sort some Greek (Unicode) strings according to the Greek alphabet. Strangely, there wasn’t much help for this online, so here is what I figured out, in case anyone else is facing the same task.

There are actually three different resources I looked at for providing the proper sort order (collation) for various languages and scripts:

I ended up choosing the last of these, the PyICU module. This is a set of Python bindings to the ICU C++ library, a powerful library for manipulating data according to the standards of different languages and regions. It took me quite a bit of poking around (and trial and error) to get the collation working, but here is the code for sorting a simple list of strings:

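 # -*- coding: utf-8 -*-
 # Python 2.7 needs the coding line above to accept the Greek literals below
 from icu import Locale, Collator
 myloc = Locale('el')  # 'el' is the locale code for Greek
 col = Collator.createInstance(myloc)  # collator that applies Greek ordering
 words = ['ἀγ', 'βλα', 'ὁμηρ']
 # col.compare works as an old-style comparison function for cmp=
 sorted_words = sorted(words, cmp=col.compare)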
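
For comparison, here is a rough sketch of what a plain sorted() does with the same three words. It compares code points, and precomposed polytonic letters sit in the Greek Extended block (U+1F00 and up), well after the basic Greek letters. The escapes below assume the words are stored in precomposed form; that is an assumption about the data, not something PyICU requires.

```python
# Plain sorted() orders strings by code point, not by the Greek alphabet.
# Assumption: the words use precomposed polytonic characters.
words = [u'\u1f00\u03b3',              # ἀγ   (alpha with psili, gamma)
         u'\u03b2\u03bb\u03b1',        # βλα
         u'\u1f41\u03bc\u03b7\u03c1']  # ὁμηρ (omicron with dasia, ...)

naive = sorted(words)

# First code point of each word, in the order sorted() produced
code_points = [hex(ord(w[0])) for w in naive]
```

Here naive puts βλα ahead of ἀγ, because β (U+03B2) has a lower code point than ἀ (U+1F00). That is the problem the collator solves.
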

I actually needed to sort a list of dictionaries by comparing one of the dictionary values, while still sorting with the Greek collation. Here’s what that code looks like:

 # -*- coding: utf-8 -*-
 from icu import Locale, Collator
 from operator import itemgetter
 myloc = Locale('el')  # 'el' is the locale code for Greek
 col = Collator.createInstance(myloc)
 list_of_dicts = [{'id': 0,
                   'word': 'ἀγ'},
                  {'id': 1,
                   'word': 'βλα'},
                  {'id': 2,
                   'word': 'ὁμηρ'}
                  ]
 # Python 2 only: key= pulls out each 'word' value first,
 # then cmp= compares those values with the Greek collator
 sorted_dicts = sorted(list_of_dicts,
                       key=itemgetter('word'),
                       cmp=col.compare)
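
Python 3 dropped the cmp= argument, so the snippet above won't run there unchanged. A minimal sketch of one way to port it uses functools.cmp_to_key (available from Python 2.7 on). The names sort_dicts and fallback_compare are made up for illustration, and fallback_compare is only a crude stand-in (it strips accents and breathings) so that the example runs without PyICU; it is not real Greek collation.

```python
import unicodedata
from functools import cmp_to_key


def sort_dicts(list_of_dicts, field, compare):
    """Sort dicts by one field using an old-style comparison function."""
    # Wrap the comparator so it can be used through key=.
    to_key = cmp_to_key(compare)
    return sorted(list_of_dicts, key=lambda d: to_key(d[field]))


def _strip_marks(text):
    # Decompose, then drop combining marks (accents, breathings).
    decomposed = unicodedata.normalize('NFD', text)
    return u''.join(ch for ch in decomposed if not unicodedata.combining(ch))


def fallback_compare(a, b):
    """Hypothetical stand-in for col.compare; NOT real Greek collation."""
    ka, kb = _strip_marks(a), _strip_marks(b)
    return (ka > kb) - (ka < kb)


try:
    # Use the real Greek collator when PyICU is installed.
    from icu import Locale, Collator
    compare = Collator.createInstance(Locale('el')).compare
except ImportError:
    compare = fallback_compare

list_of_dicts = [{'id': 0, 'word': u'\u1f41\u03bc\u03b7\u03c1'},  # ὁμηρ
                 {'id': 1, 'word': u'\u03b2\u03bb\u03b1'},        # βλα
                 {'id': 2, 'word': u'\u1f00\u03b3'}]              # ἀγ

sorted_dicts = sort_dicts(list_of_dicts, 'word', compare)
```

With PyICU installed, passing the collator's getSortKey method as key= is another option that avoids the wrapper entirely, for example key=lambda d: col.getSortKey(d['word']).
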

It’s pretty powerful once you figure it out! Another nice thing about PyICU is that in Python 2 it doesn’t care whether the values it collates are byte strings (str) or unicode objects. It accepts both and assumes that byte strings are encoded as UTF-8.
